Exploring Safer Behaviors for Deep Reinforcement Learning
Authors
Abstract
We consider Reinforcement Learning (RL) problems where an agent attempts to maximize a reward signal while minimizing a cost function that models unsafe behaviors. Such formalization is addressed in the literature using constrained optimization on the cost, limiting the exploration and leading to a significant trade-off between cost and reward. In contrast, we propose a Safety-Oriented Search that complements Deep RL algorithms to bias the policy toward safety within an evolutionary cost optimization. We leverage the benefits of evolutionary exploration to design a novel concept of safe mutations that use visited unsafe states to explore safer actions. We further characterize the behaviors of the policies over desired specifics with a sample-based bound estimation, which makes prior verification analysis tractable in the training loop. This drives the learning process towards safer regions of the policy space. Empirical evidence on the Safety Gym benchmark shows that we successfully avoid drawbacks on the return while improving the safety of the policy.
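The abstract does not spell out how the sample-based bound estimation works. As a rough illustration only, the sketch below estimates how often a policy chooses an unsafe action over randomly sampled states and attaches a one-sided Hoeffding confidence bound. This is a generic Monte Carlo estimator, not the paper's method; the names `estimate_violation_rate`, `sample_state`, `is_unsafe`, `cautious_policy`, and `reckless_policy`, and the one-dimensional toy environment, are all assumptions made for the example.

```python
import math
import random


def estimate_violation_rate(policy, is_unsafe, sample_state,
                            n_samples=1000, delta=0.05, seed=0):
    """Monte Carlo estimate of how often `policy` picks an unsafe action.

    Returns (empirical_rate, upper_bound). By Hoeffding's inequality the
    true violation rate is below upper_bound with probability >= 1 - delta.
    """
    rng = random.Random(seed)
    violations = 0
    for _ in range(n_samples):
        state = sample_state(rng)
        action = policy(state)
        if is_unsafe(state, action):
            violations += 1
    rate = violations / n_samples
    # One-sided Hoeffding bound: P(true_rate > rate + eps) <= exp(-2 * n * eps^2)
    eps = math.sqrt(math.log(1.0 / delta) / (2.0 * n_samples))
    return rate, min(1.0, rate + eps)


# Hypothetical toy setting: the state is the distance to an obstacle in [0, 1].
def sample_state(rng):
    return rng.random()


def is_unsafe(distance, action):
    # Moving forward (action 1) when closer than 0.2 counts as a violation.
    return action == 1 and distance < 0.2


def cautious_policy(distance):
    # Stops (action 0) whenever the obstacle is closer than 0.3.
    return 1 if distance > 0.3 else 0


def reckless_policy(distance):
    # Always moves forward, regardless of the obstacle.
    return 1


if __name__ == "__main__":
    for name, policy in [("cautious", cautious_policy),
                         ("reckless", reckless_policy)]:
        rate, upper = estimate_violation_rate(policy, is_unsafe, sample_state)
        print(f"{name}: empirical rate={rate:.3f}, upper bound={upper:.3f}")
```

An estimate of this kind is cheap enough to recompute inside a training loop, which is one plausible reading of why a sample-based bound makes the analysis tractable; the bound tightens as `n_samples` grows.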
Similar resources
Deep Reinforcement Learning for 2048
In this paper, we explore the performance of a Reinforcement Learning algorithm using a Policy Neural Network to play the popular game 2048. After proposing a modelization of the state and action spaces, we review our learning process, and train a first model without incorporating any prior knowledge of the game. We prove that a simple Probabilistic Policy Network achieves a 4 times higher maxi...
Operation Scheduling of MGs Based on Deep Reinforcement Learning Algorithm
In this paper, the operation scheduling of Microgrids (MGs), including Distributed Energy Resources (DERs) and Energy Storage Systems (ESSs), is proposed using a Deep Reinforcement Learning (DRL) based approach. Due to the dynamic characteristic of the problem, it is first formulated as a Markov Decision Process (MDP). Next, Deep Deterministic Policy Gradient (DDPG) algorithm is presented t...
Collaborative Deep Reinforcement Learning
Besides independent learning, the human learning process is highly improved by summarizing what has been learned, communicating it with peers, and subsequently fusing knowledge from different sources to assist the current learning goal. This collaborative learning procedure ensures that the knowledge is shared, continuously refined, and concluded from different perspectives to construct a more profound...
Deep Reinforcement Learning
Combining deep model-free reinforcement learning with on-line planning is a promising approach to building on the successes of deep RL. On-line planning with look-ahead trees has proven successful in environments where transition models are known a priori. However, in complex environments where transition models need to be learned from data, the deficiencies of learned models have limited their...
Deep Reinforcement Learning
In reinforcement learning (RL), stochastic environments can make learning a policy difficult due to high degrees of variance. As such, variance reduction methods have been investigated in other works, such as advantage estimation and control variates estimation. Here, we propose to learn a separate reward estimator to train the value function, to help reduce variance caused by a noisy reward sig...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2022
ISSN: ['2159-5399', '2374-3468']
DOI: https://doi.org/10.1609/aaai.v36i7.20737